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Abstract We consider a simple mathematical model of gradual Darwinian evolu- 
tion in continuous time and continuous trait space, due to intraspecific competition 
for common resource in an asexually reproducing population in constant environ- 
ment, while far from evolutionary stable equilibrium. The model admits exact 
analytical solution. In particular, Gaussian distribution of the trait emerges from 
generic initial conditions. 



1 Introduction 

The question considered in this paper is: suppose a population evolves according 
to the Darwin's mechanism involving mutations and natural selection, and some 
of its quantitative traits change gradually, what is the rate of this gradual change? 
This question may be not the most interesting when applied to analysis of past evo- 
lution, say fossil records, where the epochs of such gradual changes are relatively 
short compared to much longer epoch when the species appear unchanged ( "punc- 
tuated equilibrium", Eldredge and Gould, 1972). However, the speed of evolution 
is crucial in constant competition of taxa ("Red Queen" hypothesis, Liow et al 
2011; "evolutional arms race", Dawkins and Krebs 1979), and is of considerable 
practical importance in relation to present day phenomena such as adaptation of 
pathogens to existing methods of treatment, or adaptation of endangered species 
to changing environmental conditions. 

The literature dedicated to this subject is vast. Here we mention here only 
some cornerstone publications, most relevant to the present communication, to 
designate its context and motivation. 

Quantitative approach to evolution dates back at least to Fisher's (1930) book, 
which contained his famous "Fundamental Theorem of Natural Selection" , stating 
that the rate of increase of the mean fitness of a population at any moment of time, 
attributable to to natural selection, equals the genetic variance of fitness of that 
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population at that moment of time. This result is as elegant and powerful as it is 
difficult to apply correctly, since its deceptively simple words encrypt complicated 
concepts, as it took nearly 40 years to figure out (Price, 1972). The next question 
is, of course, what determines this variance in the population fitness, and how to 
predict it. 

Adaptive dynamics is mainly concerned with qualitative questions such as di- 
rection of evolution, stability of evolutionary steady states and speciation due to 
branching (Geritz et al, 1998; Bowers, 2011). On the quantitative level, the funda- 
mental for adaptive dynamics is the "canonical equation" . The influential paper 
by Dieckmann and Law (1996) gives this equation in the form (for a single selected 
trait) 



where /i(t) is the average value of the trait at time t, r(x, /i) measures fitness of 
individuals with trait value x in the environment of resitent trait values n and 
the coefficient K{fi) is described as a "non-negative coefficient, . . .that scales the 
rate of evolutionary change" . Dieckmann and Law have endeavoured to derive this 
coefficient based on a certain miscroscopic model of mutation and selection pro- 
cesses, and come to the result that it is equal to the variance of the population with 
respect to the quantiative trait, which, when applied to the fitness, exactly repro- 
duces the Fisher's result. In their derivation, Dieckmann and Law made an essential 
assumption, with reference to separation of times between mutations and selection 
in the limit of slow evolution and the competitive exclusion principle, that at each 
moment of time, the selection reduces the population to a certain type, which how- 
ever changes in time due to random mutations ( "quasi-monomorphic framework" ). 
This framework, under the name of the "Trait Substitution Sequence" model, has 
been rigorously justified by Champagnat (2006), using a stochastic model, under 
certain asymptotic assumption about the mutation rate. Slightly simplifying, the 
key assumption is that mutations are so rare that for a given population size, 
there is sufficient time between consecutive mutations for the whole population 
to convert to the new trait value if it is fitter than the previous. They also con- 
sider the opposite limit which they call "large-population limit with accelerated 
births and deaths" , in which the population is so big and mutations so frequent 
that at any time there the population consists of many different types. This leads 
to a deterministic model in the form of an integro-differential equation, which is 
akin to Fisher's reaction-diffusion equation, only for population distribution in the 
trait space, and is a time-dependent variation of the Kimura (1965) model. Such 
deterministic models have been studied in many works, e.g. Calsina and Perello 
(1995); Gudelj et al (2006); Schuster (2011). These works typically concentrate on 
the analysis of stationary solutions corresponding to the evolutionary stable states, 
rather than quantifying the speed with which these states may be approached. 

Quantitative genetics has developed a number of its own approaches and investi- 
gated a great number of complicated problems related to quantitative description 
of evolution. One approach is through the method of moments. An example is the 
paper by Barton and Turelli (1987) which considered multilocus determination of a 
quantitative trait in a sexually reproducing population, and in particular presented 
an infinite chain of ordinary differential equations for the moments of allelic distri- 
bution. A more recent example is a paper by Sasaki and Dieckmann (2011), which 
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looks at multiple-peak character distributions, called "olygomorphs" , whereas each 
of the morphs is treated by the method of moments. The chain of equations for 
the moments is typically treated using a "closure" procedure, say by assuming 
that the distribution is Gaussian. Gaussian distributions naturally occurred in a 
number of studies, e.g. by Kimura (1965), Lande (1975) and Turelli (1984), as 
stationary solution at stabilizing selection. However, stabilizing selection is hardly 
relevant to description of intermediate states of continuing evolution; hence as far 
as we can see, the question whether and when Gaussian distribution may actually 
realize during such evolution remains open. 

Here we aim to analyze the speed of evolution while far from any evolution- 
ary stable state, based on the simplest possible meaningful model. This is a de- 
terministic integro-partial differential equation, which is similar to various forms 
postulated or derived elsewhere. We also provide a simple derivation of this model 
from "first principles" , avoiding to make non- verifiable assumptions, for fear that 
the ultimate results may become artefacts of any such assumptions. We consider 
single asexually reproducing species, stick with phenotypic description and use 
dynamic formalism, leaving all stochastics within the underlying population dy- 
namics model of intraspecific competition. As a model of intraspecific competition, 
we consider a predator-prey system where various predator populations depend on 
a common prey and do not interact otherwise. All these assumptions are admit- 
tedly rather restrictive; we believe that ab mitio approach not only helps to make 
clear what postulates the ultimate results depend on, but can also suggest a basis 
for subsequent generalizations for more realistic assumptions. The simplicity of the 
resulting mathematical model allows its exhaustive rigorous study. In particular, 
the Gaussian shape of the trait distribution does indeed emerge spontaneously. 
The practical utility of the model is illustrated by providing treatment of a more 
realistic model via asymptotic methods. 

The structure of the paper is as follows. Section 2 introduces the main equation 
and its variations. Section 3 describes the "normal solution" of the simplified 
version of the model, which underlies subsequent analysis. Section 4 goes on to 
consider the general solution of the simplified model. These results are extended 
to the more generic version of the model by means of a perturbation theory in 
Section 5. Section 6 is dedicated to the discussion of the results. We conclude 
with Appendix A with the derivation of our model "from the first principles" and 
Appendix B with the proof of the theorem presented in Section 4. 



2 The model 

We consider diffusive, mutational spread of a phenotypic trait distribution during 
a transient phase of optimizing evolution, described by a deterministic model of 
the form 



du , , ,w NN d 
- = ir{.)-N{t))u+- 



C{x)u + D{x) 



du 
dx 



N{t) 



u[x,t)ilx, (2) 



where a; is a continuous trait, u{x,t) > is population density in the trait space so 
that N{t) > is the total population at time i, r{x) is the fitness of trait value x 
measured as the low-density reproduction rate of the given type and corresponding 
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carrying capacity of the habitat, and C{x) and D{x) > represent mutations, with 
C{x) for the directed component and D{x) for the diffusive component. The case 
C{x) = D'{x) of this equation can be obtained as a special case of deterministic 
integro-partial differential equation (4.5) derived in Champagnat et al (2006) as 
a weak limit of a stochastic model, with convolution kernels U = const and V = 
const, (the "mean-field" case), and with an appropriate choice of functions b and 

d. A simple non-rigorous derivation of (2) through a continuous-trait limit of a 
deterministic population dynamics model is given in Appendix A. The model is 
admittedly rather simplified and ignores many important evolutionary factors, 

e. g. frequency dependent selection. Some equivalent forms of special cases of this 
equation found in literature will be mentioned below. 

We shall first look at solutions of (2) in simplifying assumptions regarding its 
coefficients, and then relax those assumptions by means of a perturbation theory. 
So, in the simplified version, we take that D{x) = D = const (which is a significant 
limitation as normally one would expect that mutation rate is proportional to birth 
rate so should be varying together with the reproduction rate r{x)). Further, the 
coefficient C{x) represents possible mutations bias with respect to the selected 
trait. In many studies it is assumed to be zero, however it may correspond to non- 
selective evolution, discussed e.g. by Koonin (2009). For the sake of simplicity, 
we take C = const; then without loss of generality we take C = 0, as a nonzero 
constant C would simply add — C to the trait change rate. So, 

oo 

^ = [r{x)-N(t))u + D^, N(t)= I u{x,t)dx. (3) 

— oo 

Function r{x) plays the role of the relative fitness of the subpopulation with trait 
value X. Considering this function a constant would not be appropriate as it would 
remove any selection; in our simplified version we take it to be a linear function 
r[x) = tq -\- r\x, where ro, r\ = const. This results in the equation 

oo 

^ = {ra+r^x-N{t))u + D^, N{t)= j u{x,t)dx, (4) 

— oo 

which is essentially identical e.g. to equation (2.1) postulated by Calsina and 
PereUo (1995). Substitution 

u{x,t)=N{t)p{x,t), (5) 

brings the evolution equation (3) to the well known (e.g. Taylor and Jonker, 1978; 
Page and Nowak, 2002; Karev, 2010) "replicator-mutator" form. 



% = irix)-mv^D'^^ 



{r{x)-f{t))p + D^, f{t)= I p{x,t)r{x)dx. (6) 



Note that by virtue of (5) and the definition of A*' in (3), we automatically have 

p{x,t) dx=l (7) 



/ 
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at any time. Function p{x,t) is the probability density function (PDF) of the 
population in the trait space x at time t, and r(t) is the mean fitness of the 
population at that time. If solution of the replicator- mutator equation (6) is known, 
then the total population size can be found by solving equation 

^ = {r{t) - N)N, (8) 

which then recovers a solution to the original evolution equation (3) via (5). 
A further substitution 

p{x,t) = P{t)v{x,t) (9) 
brings equation (3) to the form 

-=r{x)v + D^, (10) 
provided that the factor P satisfies 

oo 

P/P = -P / r{x)v{x,t)dx. (11) 



Hence the integral part of the equation uncouples from the differential, the closed 
linear differential equation (10) can be solved first, and the linearizing factor can 
be found afterwards via (12) which, for the initial condition P(0) = 1, gives the 
explicit expression 

(t oo \ ^1 

l + y j v{x,T)r{x)dxdT\ (12) 
-oo I 

(this is continuous-trait variant of linearization used by Schuster (2011)). In eco- 
logical terms, the linear equation (10) corresponds to the case when the subpop- 
ulations with different traits x are in no direct competition with each other or 
even within themselves and, aside from mutations described by the term Dvxx, 
each subpopulations multiplies by a Malthusian law with its own reproduction 
coefficient r(x). 

For the linear fitness function r{x), we have the alternative forms of (4) respec- 
tively as 

oo 

= {ro + rix-f{t))p + D-^, r{t)= r{x) p{x,t) dx, (13) 

— oo 

and 

_ = ^ro + nx)v + D^. (14) 

In the next two sections, we concentrate on the solution of equation (4) and its 
quivalent forms (13) and (14), before relaxing the simplifying assumptions in Sec- 
tion 5. 
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3 The normal solution 



We look for solutions that describe the dynamic change of the population during 
its gradual adaptation, while far from any evolutionary steady state. We note that 
equation (13) admits a family of exact self-similar solutions, which are Gaussians, 
or PDFs of normal distributions, 

1 r (x-j4tyf' 



p{x,t) = p*{x,t; fi,a) 



exp 



2a{ty 



(15) 



As is easily verified by direct substitution, function (15) is a solution of equation 
(13), provided that the parameters of the normal distribution obey the following 
system of ordinary differential equation: 

dF 



: ri a 



dt ' 

We shall call (15) a normal solution of (4). 
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Fig. 1 Establishment of the self-similar solution in (4) Profiles of population density in the 
trait space at selected moments of time during initial transient following a "triangular" initial 
distribution. The dashed lines are normal distributions corresponding to the solutions at chosen 
moments of time (with the same mean and variance). Parameters: ro = 1, ri = 1/3000, D = 1. 
Numerical simulation on the interval x £ [0, 3000] with Neumann boundary conditions (simu- 
lation with Dirichlet boundary conditions or wider interval produces indistinguishable results) , 
timestepping by forward Euler with second order accurate central difference for d'^u/dx'^ and 
trapezoidal rule for J udx, space step 1/4, time step 1/50. Shown is only the left end of the 
interval of x. 



We stress here that functional form (15) is not an arbitrary assumption, but an 
exact consequence of the evolution equation (4), once appropriate initial conditions 
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are supplied. These initial conditions should, of course, be Gaussian. However, 
numerical simulations shown in fig. 1 suggest that the general solution at arbitrary 
initial distributions asymptotically become normal as time increases, so the special 
class (15) should in fact be fully representative. This is indeed the case as we show 
below. 



4 General solution 

We formulate properties of the solution to (13) for a wide class of initial conditions, 
which generalize the properties of the normal solutions shown above. We do that 
in terms of the moments of the function p{x, t) considered as PDF of the random 
quantity a; at a given moment of time t. Namely, we use the mean, 

oo 

Mt)=/.P(.,Od., (17) 



and the variance 



a'^{t)= J {x- ii{t)fp{x,t)dx. (18) 



Theorem 1 Let p{x,t) be a solution of (13) with initial condition p{x,Q) = pq[x). 
We assume that that the initial condition satisfies 

oo 

(Al) J poix)dx = l, 

— oo 

(A2) po{x) IS analytical for X £ [a;_,a;+], 
(A3) po{x) =Oforx^ [x-,x+]. 

Then this solution can be written in the form 

p{x,t) = ^w(^—l^,t) (19) 



where w[-,t) is a PDF of a zero-mean, unit-variance distribution, and for all t > 
(CI) w[-,t) has moments of all orders, 

Mn{t) = I r,"-w{r,,t)dTi, \Mn{t)\<(x, ?i = 1, 2, 3, . . . , 

(C2) it converges (in distribution) to the PDF of normal distribution, 

lim w{rj,t) = e ^ 

t-i-oo ^2n 

(C3) the mean of PDF p{- ,t) varies according to 



f=n.=, (20) 
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(C4) and its variance according to 



da 
dt 



H Tjr-icr 7(f), 

a Z 



(21) 



where 'y{t) = M^{t). 

The proof is given in Appendix B. 

Conclusion (C3) states that equation (16a) remains exact in the general case. 
This is, of course, a special case of the generic Fisher-Price law (Fisher, 1930; 
Price, 1972) as applied to the current model. 

On the contrary, conclusion (C4) states that equation (16b) is not exact and in 
the general case requires a correction associated with the instant value of skewness 
7 of the distribution. However, (16b) is "asymptotically valid" for large t, as the 
skewness, according to (C2) vanishes in the limit t — oo. Moreover, from the proof 
we can see that the asymptotic order of 7 is such that the second term in (C4) is 
asymptotically smaller than the first term in the limit t — s> 00. 

5 Perturbation theory 

The previous results were for a simplified version of the model, where dependencies 
C[x), D{x) were replaced by constants and r{x) was replaced by a linear function, 
to allow a simple analytical solution. Now we would like to ensure that these 
solutions are not exceptional and small violation of the simplifying assumptions 
will not lead to completely different solutions. So, we now consider generic smooth 
dependencies for C{x), D{x) and r{x), but assume that the variation of x across 
the population at any given time is smaller than the typical scale at which these 
functions vary significantly. So, we develop a perturbation theory where the small 
parameter is cr, the standard deviation of the selected trait in the population. 
Admittedly u is not a constant but a dynamic variable; this however does not affect 
the formal asymptotic expansions. Equivalently one could use the inital value ctq as 
the small parameter. This, however, complicates notation, so we do it only in one 
place where such explicit treatment is essential. We require that functions C{x), 
D{x) and r(x) and the necessary number of their derivatives are bounded and 
that D{x) is everywhere bigger than some positive constant, so that \D'{x)/D{x)\ 
is bounded. We silently assume that all the moments are at most 0(1), before 
establishing their actual asymptotic orders more accurately. All results in this 
section are obtained formally, without any attempts of rigorous justification. 

We start from the integro-partial differential equation (2) and bring it to the 
form of replicator-mutator equation by substitution 



pt = {r{x)-r{t))p+[C{x)p + D{x)px]^, r{t) = r{x)p{x,t) dx. (23) 



Now we pass from the probability density p{x,t) to the normalized (zero-mean, 
unit-variance) probability density ui, through p's mean 



u{x,t) = N{t)p{x,t) 



(22) 



which gives 




(24) 
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and its variance 

a'^{t) = jix- fi{t)fp{x,t)dx, (25) 

via substitution 

P{x,t) = ^^w{ri,T), V = - — r = t. (26) 

(for chain rule differerentiation, it is convenient to distinguish the time variables 
in the old coordinates {x,t) and the new coordinates (ry, r); for functions of one 
variable such as a and /i this distinction is of course not important). Then 

Wr = — W+\— + jy— ] Wri + {r — f) W + Jr], (27) 
a \(J a j 

where J(77,t) = ^Cm + -^Dwn, C{ri,T) = C(/i(r) + crri), £1(77, r) = D (/i(r) + 0-77), 
a a 

r{rj,r) = r {^j,{t) + rja) , r(r) = J f{ri,T)w{r],r)drj. By construction, equation (27) 
is subject to constraints 

'' w{r,,T)dr^=l, (28) 



rjw{r],T)d7^ = 0, (29) 

Tf^ w{r],T) di] = 1. (30) 
Let us expand functions of x in Taylor series, 

riv,T) = XI —'^"'V'rn, rn = r„(r) = r'^"^(/x(T)), 



n'. 
n=o 



^("1'^) = E ^^"v'^Cn, Cn = C„(r) 4 C(")(/i(r)), 
n=0 

00 

-'^ — ^ n 



n! 

Tl = 



and consider the formal asymptotic expansion of equation (27) in the small pa- 
rameter a. 

It is straightforward to see that if J wd?] = 1 at r = 0, it remains so for all 
r > 0, so constraint (28) is always satisfied. The constraints (29) and (30) lead to 
asymptotic series for /i and 0. Further, multiplying both sides of equation (27) by 
r;", n > 3, and integrating over 77 leads to asymptotic series for the moments Mn- 
In this way, we obtain 



A = (Di - Co) + [ri + - icaj + i^r^ + ^D^ - j T'^' + O (a^) , 
7 = -SDoja-^ + 6Dia-^ + ^DaT'^" + O (a^) . (31) 
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So, in the selected asymptotic orders, of all the moments only the skewness 7 = M3 
affects the evolution speed. To see how big is its effect, we now need to consider 
a = a{t) as a function of time rather than merely a small parameter. To estimate 
the upper bound for 7(t), we consider 7 as a function of a, with the initial condition 
7(0) = 70, cr(0) = o-q: 

^ = 7/a = --37a-' + 6DiDo^ + O (a^ 

QO" V 



hence 



(T^7 - ao7o 



6 j (Di/Do)a'^da' + o(^a') , 



where the integrand Di/Dq depends on a via fi = fi{t) and t = t{a). Let us assume 
that 

\Di/Dq\ <G = const 
for the whole solution under consideration. Then it follows that 

3 



I7I < 170(^0/^)3] + -Ga + Op) 



So, our upper bound for skewness 7 consists of two components: one is related to 
the decaying contribution of the initial condition, \'yo{ao/a)^\ < \"/q\, and 7q((jo/(j)3 = 
O ^t~3/^) — >• as t — >• 00, and the other is the contribution of the perturba- 
tion, which is O {a). So if the effect of the initial skewness can be neglected, say 
70 = O (cr), then we have 7 = (a), and from (31) we have finally our asymptotic 
evolution equations 

A = {D'{^,) - CM) + (^r'(M) + iD"'(M) - ic"(M)) ^' + (a*) , (32a) 

& = D{ti)a-^ + (Id"{ii) - C'iii)^ a + O (a^) . (32b) 

This (asymptotically) closed system of ordinary differential is a generalization of 
the previous result (16) and it asserts that that simple system of two equations 
remains approximately valid even if fitness gradient and mutation diffusivity are 
not constant, only subject to the drift term due to mutation bias —C'{n), as we dis- 
cussed above. Moreover, it provides higher-order corrections to that simple system, 
if necessary. 

Further increase in asymptotic accuracy will require including into considera- 
tion higher-order moments. Let us consider higher-order asymptotic equations for 
a special case when D{x) = D = const, C{x) = C = const so only the fitness r{x) 
is trait-dependent. Then reasoning as before, we see that in this case 7 = (cr^), 
M4 = 3 -I- O ia^) , and therefore 



A = -C + r'{^,)a^ + \-^r"{^,)o^ + ir"'(M)a^ + O (a^) , (33a) 
a = Da-l + i7^'(M)'^' + \r" {^i)o'' + O (a^) . (33b) 
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The third term in the right-hand side of (33) (b) describes the effect on the variance 
of the stabilizing {r" < 0) or disruptive (r" > 0) selection. Note that at a fitness 
maximum, r' = 0, r" < 0, equation (33) (b) gives a stationary variance at = 

1 /2 

2_D/r-"(/i)) , in agreement with the classical result by Kimura (1965). 



6 Discussion 



We have considered solutions of a simple model of gradual Darwinian evolution 
in continuous phenotypicl trait space and continuous time, while far away from 
evolutionary stable equilibrium. 



analytic 



2000 



•R 1000 



I ' ' 1 

1000 2000 3000 

trait space, x 

Fig. 2 Accelerating evolution according to equation (4). Continuation of the same simulation 
as shown in fig. 1, on a larger scale. The solution of (4) is shown as density plot on the space- 
time plane. For comparison, the dashed line shows the corresponding solution x = fj,{t) of 
equations (16), with initial condition o"(0) = and /^(O) at the fittest edge of the initial 
condition of (4). 



In biological terms, the main predictions of the model are: 

— Equation (8) states that total population size is described by Verhulst dy- 
namics, where the mean reproduction rate f{t) is itself dynamically changing. 

— Equations (16a), (20), (32) (a), (33) (a) describe the dynamics of the mean trait 
value j-i{t), which has a selective component, proportional to the local fitness 
gradient r '(/i) and instant distribution variance o-^{t), and a non-selective com- 
ponent, due to mutation bias C{n). The selective component corresponds to the 
canonical equation (1) of the adaptive dynamics and to Fisher's Fundamental 
Theorem of Natural Selection. That is, not only population evolves faster in a 
stronger fitness gradient, but also populations which are more diverse with re- 
spect to the selected trait, evolve faster, and those that are very homogeneous 
in the selected trait, evolve slower. 

— Equations (16b), (21), (32)(b), (33)(b) describe the dynamics of the variance. It 
describes diffusive spread of the variance, in the long run by the law a'^ = 2Dt, 
where the mutations rate D plays the role of diffusivity. 

— The Gaussian distribution of the trait in the population emerges spontaneously 
during the course of evolution, as stated by conclusion (C2) of Theorem 1. This 
eliminates need for any artificial closing procedures in the moments equations. 
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The Gaussian distribution is preserved even if the gradient of fitness and the 
mutation parameters vary with the change of the trait coordinate, provided 
that their variation is slight within the spread of the population. 

Since the rate of evolution is proportional to the variance, and the variance in 
the long run is constantly growing, these result predict that the evolution has a 
propensity to accelerate, for as long as the assumptions of the model remain valid. 
This property is illustrated by a simulation presented in fig. 2. 

Our underlying assumptions are self-limiting. If coefficients C, D and r' are 
varying significantly on the trait space scale of Xc, then our perturbation theory 
will remain valid only for as long as the variance is small enough, cr^ <C x^. On 
the other hand, this variance grows unbounded, cr^ « 2Dt, hence this theory may 
be only valid for a limited time interval, t -C x^/{2D). However, this limitation is 
consistent with our goal which was to consider the transient phase of the evolution, 
before the population reached the optimum, as stationary distribution around the 
optimum is considered in other works. 

To conclude, we discuss the role of mutations, specifically the value of coefficient 
D in the above results. According to equation (20), the instantaneous rate of change 
of the average value of the trait does not depend directly on D, which is entirely 
consistent with the general Fisher-Price law which claims that it is determined only 
by intensity of selection and variance of the instant trait distribution. As to the 
variance itself, then according to (21) in the long run it always grows with the rate 
directly proportional to D. The convergence of the distribution shape to Gaussian 
also depends on mutation. According to estimates obtained in Appendix B, 7(4) a; 
Q/(r'j'-D^^^f®^^), t — >■ 00, where Q is a constant depending on initial conditions. 
Then, accepting smallness of skewness 7 as a measure of convergence to Gaussian 
distribution, we see that this convergence is due to both selection (ri) and mutation 
(D) and so will be very slow in the limit of ri — >• or D — s> 0. So, for a fixed time 
interval and very near to the fitness extremum, or for very small mutation rate, our 
results become inapplicable. This is of little surprise as mutation without selection 
as well as selection without mutation are completely different biological situations 
as well as completely different mathematical problems. The case of small mutation 
rate is considered in a recent paper by Sasaki and Dieckmann (2011), who start 
from selection without mutation and then add mutation as a perturbation, thus 
presenting an approach which is complementary to the one given here, and more 
appropriate for analysis near fitness extrema, as opposed to nearly-linear fitness 
landscapes considered here. 
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A Population dynamics origin for the model 

Many if not all individual ideas of our derivation are found in literature; however we could not 
find such derivation as a whole, so present it here in its entirety, step by step, emphasizing 
all assumptions made on the way, in order to identify the limitations of the resulting model, 
which may lead to ways to overcome these limitations. Wc start from the population dynamics 
model of Lotka-Volterra-Gause type, where the predator population consists of a number of 
subpopulations zj, differing in their parameters, all consuming the same prey (resource) R, 
mutating into each other, and not interacting otherwise: 

dR f ^ \ 

^ = Zj (-7j +5,R)+mj, i = ± 1, ±2, . . . , (34) 

where a is the low-density reproduction rate of the resource in absence of grazing pressure, fij 
describes the per capita grazing pressure by trait j on the common resource, the coefficient 
at the quadratic term in the first equation of (34) is unity due to the choice of the unit of 
measurement for _R, {—-fj + SjR) is the resource-dependent reproduction rate of trait j, and 
nij is the contribution of mutations, defined as 

nij = m+_j2j_i -I- m-^^2j+i - (m+ + mj) Zj . (35) 

Here we chose index j so it enumerates subpopulations monotonically with respect to the se- 
lected trait. The coefficients are the rates of mutation of subpopulation j that respectively 
increase or decrease the trait index j. So, in this model different types differ both by internal 
dynamics and by their interaction with the common resource, and their competition is indirect, 
only via dependence on the common resource. The population model with mutations (34,35) 
is admittedly very simple and ignores many biological aspects which can be very important in 
real life. For instance, it ignores frequency dependent selection, as the population dynamics of 
each type does not depend on its abundance relative to other types, but only on the "mean 
field", represented by the common resource. Also, we shall consider "smooth" dependence of 
the coefficients in this model on the one-dimensional index j, which will effectively impose 
one-dimensionality on the resulting model, which is also a significant restricting assumption, 
see Metz et al (2008). We stress however that our purpose here is not to derive a biologically 
realistic model, but only to present a simple example of how equations of evolution can be 
derived from population dynamics equations. 

For mathematical simplicity, we take the set of subpopulations {j} to bo infinite, which 
in practice only means that the overall diversity of the population observed during the stage 
of evolution under consideration is much smaller than all that are theoretically possible. Also 
for mathematical simplicity only, we take that mutations are so small they occur only be- 
tween subpopulations adjacent in the trait index j; this assumption is not essential: admitting 
mutations to any small distance in j leads to the same form of dynamic equation in the con- 
tinuous limit. What is important is that we neglect the probability of mutations that change 
the trait significantly, in comparison with the diversity of the population at any given moment 
of time; taking those into account would make the resulting equation a bit more complicated 
(mutations will be described by an integral rather than differential term, see e.g. Champagnat 
(2006)). There are of course also lethal mutations but they can be considered as contributing 
to the death rate and thus implicitly included in -yj . 

Our next assumption is that the dynamics of resource (prey) arc much faster than those 
of predators, so the resource concentration can be adiabatically eliminated. We find the quasi- 
stationary resource concentration as 

oo 

R = a- J2 (^J^j' 

j = -oo 

which gives the predators dynamics equations as 

dz I oo \ 

= zA aSj -ij-Sj ^ PjZj I +mj. (36) 
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We change the dynamic variables to measure the subpopulation sizes by their grazing pressure, 
Uj = fijZj, and turn to the continuous limit via x = jh, h 0, where a; is a physical 



measurement of the trait. This leads to the intcgro-differcntial equation 

fill ^ f^ti ^ f)'^ II r 

— = {f[x)-S{x)N{t))u + C{x)—+D[x) — , N(t)= J u(x,t)dx, (37) 



where 



u{x) = lim(hwj) , 5{x) = limSj, 
r(x) = lim (^aSj - -yj + m+_ ^ /3j _ i + mj^^^j/^j+i - m+ ~ mj^ , 
C{x) = limh/3j (^-m+_j//3j_i + mj"^j/^j+i^ , 

D{x) = lim h^l3, (m+_,//3,_i + mJ^J/S.+i) , 

and all limits are taken as h 0, j = x/h, x = const. In particular, we assume that dependence 
of mutation rate on j and the asymmetry between beneficial and deleterious mutation rates 
mj" — mJ are such that both limits C{x) and D{x) exist and D{x) 7^ 0. A change of variables 

D{x) = D{x), C{x) = C{x) - D'{x), r{x) = f{x) - C'{x), 
transforms equation (37) to 



^ = (rix)-Six)N{t))u+^ 



C{x)u + D(x) — 
dx 



00 

N{t) = J u{x,t)dx. (38) 



Finally, in this paper, we consider the case 5{x) = const, and we rescale population density 
so this constant is unity, which gives the model (2). In biological terms, this means that we 
assume that the competition between different types is in the speed of reproduction which is 
assumed strictly proportional to the carrying capacity, i.e. the equilibrium population density 
of trait X in the given habitat in absence of other types. This choice is arbitrary and is made 
from consideration of mathematical simplicity rather than motivated by any specific biolog- 
ical examples. We note however, that the effect of this, as well as many other simplifying 
assumptions made above, can be relaxed by perturbation theory, of the kind considered in 
Section 5. 



B Proof of Theorem 1 

First two brief preliminary remarks. 

— We shall use the identity (7) in our proof. As already noted, this is ensured by construction 
of the equation, and it also can be easily verified that once this property is satisfied for 
the initial condition according to (Al), it is then preserved by the equation (6). 

— We shall, without loss of generality, assume that a;_|_ = 0; otherwise, we just make trans- 
formation X X — Xj^. 

Note also that some of the variables used here coincide by name with variables used in 
Appendix A but have different meaning (m, 5). 

Proof 1° We start by proving (CI). We do that using the equivalent linear equation (14). 
First, we construct the fundamental solution of that equation, that is, a generalized solution 
V{x, t ; ^) with initial condition V(x, ; 5) = Six — ^), where 5() is Dirac's delta-function. This 
can be obtained from the normal solution p*{x, t; fJ.,a-) (15,16) with initial conditions /i(0) = ^ 
and cr[0) = 0, transformed by (9) and (12), which leads to 



V{x,t;0 



1 



cxp 



(ro + rii)t + 



4Dt 



(39) 
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Hence for the generic initial condition v{x,0) = po{x) we have 
v{x,t) = J V{x,t; OPo(e)dC 

— OO 

-4Ai?-(x-Ai-?)' 



/ 



Ki / cxp 



(40) 



where 



■■ 2Dt, 



p, = nDt^ 



are some linown functions of time. Normalization of this solution gives the PDF 'p(-,t), the 
moments of which arc found as 



^ L/Iq, 



oc oo 



exp 



4ll^-{x-fl- 0' 



2(72 



Po(5)<i5 dx. 



The corresponding double integrals are absolutely convergent under the assumed properties 
of po- Furthermore, Iq > 0. Hence existence of moments of all orders at all t for the PDF p 
follows. In particular, the mean = Mi{t) and variance (7^(t) = M2{t) are defined for all 
4 > 0. The normalized PDF ui(»7, t) is obtained from p{x, t) via substitution x = fi + arj, and is 
zero-mean and unit-variance by construction. Its moments of all orders exist as they are the 
standardized moments of PDF p. This is the conclusion (CI) of the Theorem. 

2° Now that the existence of fj,{t) and a-{t) for all t > is established, we can proceed to 
prove the predictions (C3) and (C4) about their dynamics. To find the rate of change of the 
mean tJ.{t), let us first note that for a linear fitness function r{x), we have 



f{t) 



OO 



tq + rix) p{x, t) dx = ro -I- ri/x(t). 



(41) 



Now let us multiply both sides of (13) by x and integrate them over a; 6 R. This gives 



d/i 
dt 



/dp f f d^p 

X — dx = / (rox + rix'^ — fx) pdx + D / x — r- da; 
dt J J dx^ 



= ro^ — r^ + T-i j x^pdx 

— oo 

where we used the definition of the mean (17) and the integral proprotional to D vanishes 
when integrated by parts (the limit limx^±^ [xdxp{x, t)] = can be verified using (40)). Now 
we do equivalent transformations in this equation, 



dfi 

— = roll -r/i + ri 
at 



oc oo oo 

J (x^ — 2x/i + fi^) pdx + 2rifj, J xpdx — r^fj^ j pdx 



oo 

/' 



ro^ — rn + ri I {x — ^)'^ p dx + 2rip? - rip? = ria'^ + {rop + ri^"^) - rfi 



where we have used definitions (7), (17) and (18). According to (41), the last two terms cancel 
out, which delivers the conclusion (C3) of the theorem. 
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Transformation (19) ensures some (already mentioned) identities for w{ri, t), whieh we will 
now need stated explicitly. Namely, 



/ 



w{t], t) dr] = I 



(42) 



follows from (7), 



follows from (17) and 



oo 

/ 

— OO 
OO 

/ 



rj w('q, t)dri = 



Tj w(r], t) dr] = 1 



(43) 



(44) 



follows from (18). Substitution of (19) into the linear evolution equation (13), with account 
of (42—44) and the already proved identity (C3), leads to the following differential equation 



dw 



at 



dw 



dr] 



— = ri(T r]w + — H [w + r)— H 



dw 



D ^2 



(45) 



Considering the first three moments of both sides of this equation, we see that the subspace 
of functions defined by (42—44) is an invariant set of this equation if and only if 



oo 

D 1 2 /■ ^ / N , D 1 o 

1 — rjcT / r] w(ri, t) dr] = 1 — ricr 7, 

(7 2 J cr 2 



(46) 



which is conclusion (C4) of the Theorem. 

3° It remains to prove (C2). We do it by the method of moments. An exact expression for 
the normalized PDF w(r],t) is obtained from (40) via substitution x = fi + crrj, where /i = ]i{t) 
and (T = a{t) are the true mean and standard deviation of the u-distribution (as opposed to 
the "rough guess" values of the same, /l(i) and (T(i)), and an appropriate normalization. This 
gives 

- ifli - [ar] + ]i - fL - tif- 



w(r],t) = K2 exp 



/ 



2(72 



Po(«)d«, 



where K2 is a coefficient depending only on t but not on r], chosen so that f wdr] = 1. 

—00 

The moments of PDF w are found as 



where 



00 / 00 



exp 



Mn = In/Io, 

4fl^ - jar] + fj. - p. - g)2 
2a^ 



(47) 



The already mentioned absolute convergence of double integrals (47) means that Fubini's 
theorem applies and we can change the order of integration. On doing so, and also introducing 
notations 

A = {^ + 11- ]i)/a, r]=-{z + A) 



we get 



d? 



d^. 
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where is the initial PDF, "biased" by selection towards fitter trait values: 

For brevity, we shall now omit dependence on time, until we start studying the t oo asymp- 
totics. By using the binomial formula, the moment integrals are rewritten as 



fc=0 ^ ^ 

/ ^.'»d. = |0' ^.,'1"^''°"''' (48) 
J \/27r I V"^ ~ ^)--' even ^ ' 



where 

- /2 f 
" ' ™ , I 0, if m is odd 



are the moments of the standard normal distribution, and 

k 



where, in turn, <p£ are the moments of the biased initial condition, 

oo oo 

1'l= I fmd(= J €'e-i*«po«)d5- (49) 

— oo — oc 

Combining these together, we obtain 

'-(f) E^sw:^r^(^) g ("-/') '^->-'*<- 

where [x] denotes the integer part of x. In particular, 



lo = -<?o, 



hence for the moments we have 



n I'V2] I , „_2fe n-2fc 



-I 



^" = Uj ]^^ i2mn-2ky. [^) <.o- 

These expressions can be used for determining the true mean and variance in terms of the 
initial PDF, via the identities (43) and (44). Namely, we have 

Ml = -i- [{p - ^J,)'Po + <2*l] = 0, 
a<Po 

hence 

11 = p. + 'Pi/'Po. (50) 

Then, 

Ma = [■P2 + S^-Po - <^?/<?o] = 1, 

hence 

= 5-^ +<P2/'Po-<l'l/'^l (51) 
With account of these, finally we have an exact formula for the moments of w in terms of the 
initial PDF po: 



(52) 
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So, the problem of the t — >■ oo asymptotics of the moments is reduced to asymptotics of 
^n. In this limit, we haye a = (2Dt)^'^ — > oo and p, = riDt^ — > oo. We also introduce s = rit 
for breyity, and s — > oo, too. 

In terms of s, integrals <?„ defined by (49) are, up to the signs, the bilateral Laplace image 
of the initial distribution po for n = 0, and deriyatives of that image for n > 0. Asymptotics 
of Laplace images are known to be yery sensitive to analytical properties of the originals. So 
at this point our assumptions {A2) and (A3) become essential. In accordance with the second 
preliminary remark, we set x+ = and x— = —W, W > 0, without loss of generality. Then 
the initial PDF has the form: 

Po(.)=L?o'^-^'"'^^t-^'°l' (53) 
[O, x^[-W,0]. 

Then for the biased moments we have asymptotic series 

^,= £a„.(-l)-+^g^+e.s.t. (54) 

m — 

where e.s.t. stands for "exponentially small terms", that is terms O (e"^") for any A £ (0, W). 

Let g > be the smallest power in scries (53), i.e. am = for m < q and aq ^ 0. Then 
from (54) we have 

•^t = {-lY+'aqie + qy.s-'-"-^ {i + o . 



'Pi/'Pa = "{q + l)s-^ (1 + O (s-i)) = O (t-i) 

H = p. + O {t'^) , (55) 
<2>2/<?^0 = {q + 2){g + l)s-^ {l + O (s"!)) = O (f-^) 



In particular, 
so (50) gives 
and further 
so (51) gives 

^2=^2+0(4-2). (56) 

That is, the "crude guesses" do in fact give asymptotically correct predictions of the true mean 
and true variance. This is only because we have chosen = 0, more about it later. 
For ^ > 2 we have 

Substituting these results into (52) we get, after some transformations, 

M„ = (1 + O (-l)"n! g (^) (57) 

where 

Em = ± + f^y. (58) 

Here in the limit t oo, the dominant terms are those with the largest k such that 
En— 2k 7^ Oi a-nd all others will be subsumed by the factor (l + O (s^^)). For odd n, we have 

El = 0, Eg = -ig!(g+ l)-2^ 

M„ = ""("3-;]!^^^ (1 + O = O (t-/2) . (59) 

For even n, we have Eo = q\ which gives 

M„ =(n - 1)!! (1 + (i-i)) . (60) 
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Summarising, we have for all n that 

whore An arc the moments of the standard normal distribution, (48). Since the normal dis- 
tribution is uniquely characterized by its moments, convergence of w in moments here implies 
convergence of distributions, and we have the claimed (C2). □ 

Two final remarks. 

— Equations (55) and (56), meaning that the "crude guesses" p, and a happen to be asymp- 
totically accurate for the true and a, are a direct consequence of the choice xj^ = 0, 
as can be verified by repeating the calculations with generic There is a simple inter- 
pretation of this fact in biological terms. In the initial PDF po{x) there is only a finite 
range x S [a:_, x+] of traits present, and the descendants of individuals with different traits 
make different contributions to the overall population at different times; but in the present 
model, the distribution of traits in the population in the long run is such as if descendants 
of the fittest ancestors, x = xj^, dominated in it, regardless the effect of mutations. Math- 
ematically, this is eventually down to linearity of (10). With initial conditions that are 
not finite-supported, the problem becomes significantly more complicated: the "dominant 
ancestor" trait will keep changing with time. 

— According to (59), we have 7 = (t~^/^'), so the related correction ^ricr'^j in (C4) is 
O which is asymptotically smaller than the main term D/a, which is O (t^"'''^). 



